Introduction to R

Session 4

Session Overview

  1. Inputs and Outputs
  2. Graphics
  3. Bonus: Advanced Graphics with ggplot

Today

Nalan Bastürk

  • Associate Professor at QE
  • Research interests: econometrics, Bayesian statistics, financial econometrics
  • Website
  • Sessions 4 and 5

Stephan Smeekes

  • Professor of Econometrics at QE
  • Research interests: econometrics, time series, high-dimensional statistics, bootstrap, macro- and climate econometrics
  • Website
  • Sessions 2, 3 and 4

Inputs and outputs of your R session

  • Common inputs / outputs of an R session are datasets or R scripts.
  • For this meeting we focus on datasets as inputs to the R session, loading data and saving data.
  • The file format (csv, Rdata, xls, Stata, …) and the directory of the files are important to keep in mind.
  • We will go over a few options, potential issues, and how to avoid the need to type in a long directory name when managing the input and output of the R session.

Working directories and R projects

  • Whenever we provide R with a file name, it can include the full path on the computer.
  • An alternative is to work on a specified directory.
  • Another alternative is to work within a ‘project’, so that all paths are visible to the project scripts.
  • If we do not provide any path, R will use the current “working directory” for reading or writing files. It can be obtained by the command
  getwd()
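
As a minimal sketch of how a relative path is resolved against the working directory: the folder name data and the file name climate.csv are borrowed from later slides, and nothing is actually read from disk here.

```r
# Where is R currently looking for files?
wd <- getwd()
print(wd)

# A relative path is interpreted relative to the working directory.
rel_path <- file.path("data", "climate.csv")

# The equivalent full path, built without typing the long directory name.
full_path <- file.path(wd, rel_path)
print(full_path)

# file.exists() is a quick check before trying to read a file.
print(file.exists(rel_path))
```

Using file.path() rather than pasting strings together keeps the path separators consistent across operating systems.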

Using the correct directory to get input / output of the R session

  • Navigating through the menus in RStudio is easy (click and go), but requires using the menu every time the user runs the code.

  • Go to Session -> Set Working Directory. Two convenient options are:

    • Choose Directory…: Choose the directory yourself

    • To Source File Location: Set the working directory to the directory where your R Script (the source file) is saved

Using the correct directory to get input / output of the R session

  • An alternative is to use function setwd() at the beginning of your script. This line then has to be changed when the code runs on another machine.
 setwd("~/ownCloud (2)/Teaching/Rprogramming_UM/Rprogramming_UM")

Using the correct directory to get input / output of the R session

  • Recall: To set the working directory to the folder where your current R script is located, you can simply use:
library(this.path)
setwd(here())
  • Recall: We could also explicitly call the function from the package:
setwd(this.path::here())

Types of input or data that can be loaded in R

R interacts with files in several ways.

  • You can load, save, import, or export a data file.
  • You can save a generated figure as a graphics file or store regression tables as text, spreadsheet, or LaTeX tables.
  • You can load or save the full workspace (environment) you are working with, to continue another time.

Datasets can come in different formats.

  • Rdata files: Files that can be loaded directly into R
  • Other file formats (SPSS, csv, xls, …) can also be loaded in R. This often requires the use of packages.

Loading Rdata files

  • Rdata is a file format specific to R.
  • They can store a single object or several objects.
  • These files are the easiest to manage as input or output in R, since they don’t require library calls.

Load climate data from RData format:

  • The load() function is used to load data in Rdata format.
  • load() loads all objects stored in the input Rdata file.
load("data/climate_Maas_Eind.Rdata")
print(summary(data_short))
       X             STNID               NAME               CTRY          
 Min.   : 87132   Length:2896        Length:2896        Length:2896       
 1st Qu.: 88219   Class :character   Class :character   Class :character  
 Median : 92226   Mode  :character   Mode  :character   Mode  :character  
 Mean   : 93638                                                           
 3rd Qu.: 96234                                                           
 Max.   :100154                                                           
 COUNTRY_NAME          ISO2C              ISO3C              LATITUDE    
 Length:2896        Length:2896        Length:2896        Min.   :50.91  
 Class :character   Class :character   Class :character   1st Qu.:50.91  
 Mode  :character   Mode  :character   Mode  :character   Median :51.18  
                                                          Mean   :51.18  
                                                          3rd Qu.:51.45  
                                                          Max.   :51.45  
   LONGITUDE       ELEVATION          BEGIN               END          
 Min.   :5.375   Min.   : 22.55   Min.   :19490101   Min.   :20240909  
 1st Qu.:5.375   1st Qu.: 22.55   1st Qu.:19490101   1st Qu.:20240909  
 Median :5.572   Median : 68.42   Median :19490101   Median :20240909  
 Mean   :5.572   Mean   : 68.42   Mean   :19490101   Mean   :20240909  
 3rd Qu.:5.770   3rd Qu.:114.30   3rd Qu.:19490101   3rd Qu.:20240909  
 Max.   :5.770   Max.   :114.30   Max.   :19490101   Max.   :20240909  
   YEARMODA              YEAR          MONTH             DAY       
 Length:2896        Min.   :2021   Min.   : 1.000   Min.   : 1.00  
 Class :character   1st Qu.:2021   1st Qu.: 4.000   1st Qu.: 8.00  
 Mode  :character   Median :2022   Median : 7.000   Median :16.00  
                    Mean   :2022   Mean   : 6.523   Mean   :15.71  
                    3rd Qu.:2023   3rd Qu.:10.000   3rd Qu.:23.00  
                    Max.   :2024   Max.   :12.000   Max.   :31.00  
      YDAY             TEMP       TEMP_ATTRIBUTES DEWP_ATTRIBUTES
 Min.   :  1.00   Min.   :-8.50   Min.   : 7.00   Min.   : 7.00  
 1st Qu.: 92.75   1st Qu.: 6.90   1st Qu.:24.00   1st Qu.:24.00  
 Median :183.00   Median :11.45   Median :24.00   Median :24.00  
 Mean   :183.10   Mean   :11.69   Mean   :23.88   Mean   :23.88  
 3rd Qu.:274.00   3rd Qu.:16.80   3rd Qu.:24.00   3rd Qu.:24.00  
 Max.   :366.00   Max.   :30.00   Max.   :24.00   Max.   :24.00  
 SLP_ATTRIBUTES     STP_ATTRIBUTES     VISIB_ATTRIBUTES WDSP_ATTRIBUTES
 Min.   : 0.00000   Min.   : 0.00000   Min.   : 7.00    Min.   : 7.00  
 1st Qu.: 0.00000   1st Qu.: 0.00000   1st Qu.:24.00    1st Qu.:24.00  
 Median : 0.00000   Median : 0.00000   Median :24.00    Median :24.00  
 Mean   : 0.03729   Mean   : 0.03729   Mean   :23.87    Mean   :23.88  
 3rd Qu.: 0.00000   3rd Qu.: 0.00000   3rd Qu.:24.00    3rd Qu.:24.00  
 Max.   :13.00000   Max.   :13.00000   Max.   :24.00    Max.   :24.00  
      MAX            I_FOG        I_RAIN_DRIZZLE     I_SNOW_ICE     
 Min.   :-5.00   Min.   :0.0000   Min.   :0.0000   Min.   :0.00000  
 1st Qu.:10.20   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000  
 Median :16.00   Median :0.0000   Median :1.0000   Median :0.00000  
 Mean   :15.94   Mean   :0.1381   Mean   :0.6454   Mean   :0.04385  
 3rd Qu.:22.00   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:0.00000  
 Max.   :39.50   Max.   :1.0000   Max.   :1.0000   Max.   :1.00000  
     I_HAIL           I_THUNDER       I_TORNADO_FUNNEL       ES       
 Min.   :0.000000   Min.   :0.00000   Min.   :0        Min.   :0.300  
 1st Qu.:0.000000   1st Qu.:0.00000   1st Qu.:0        1st Qu.:1.000  
 Median :0.000000   Median :0.00000   Median :0        Median :1.350  
 Mean   :0.004489   Mean   :0.07562   Mean   :0        Mean   :1.478  
 3rd Qu.:0.000000   3rd Qu.:0.00000   3rd Qu.:0        3rd Qu.:1.900  
 Max.   :1.000000   Max.   :1.00000   Max.   :0        Max.   :4.200  

Load climate data from RData format:

  • load() returns the names of the loaded objects (not the data itself), so assigning its result stores those names:
loaded_names <- load("data/climate_Maas_Eind.Rdata")
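
A minimal sketch of what load() gives back, using a small temporary file so that it runs without the course data. The objects temps and station are made up for illustration.

```r
# Create a small example file in a temporary location (illustrative data only).
tmp_file <- tempfile(fileext = ".Rdata")
temps <- c(12.1, 15.3, 9.8)
station <- "Maastricht"
save(temps, station, file = tmp_file)
rm(temps, station)

# load() restores the objects under their original names and
# returns those names as a character vector.
loaded_names <- load(tmp_file)
print(loaded_names)

# To avoid overwriting objects in the workspace, load into a separate environment.
env <- new.env()
load(tmp_file, envir = env)
print(env$temps)
```

Loading into a separate environment is one way to inspect a file whose contents you do not know yet, since load() otherwise silently replaces objects with the same name.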

Loading other formats of data in R

Option 1: Using menus within RStudio is the easiest (click and go) but requires using the menu every time the user runs the code.

Loading other formats of data in R

Advice for option 1:

  • Copy the command that appears after loading the data from the menus.

Loading other formats of data in R

Advice for option 1 (cont’d):

  • Paste the command on top of your script.
  • This way, next time you do not need the menu navigation.
library(readxl)
climate <- read_excel("data/climate.xlsx")
  • You can view the data by clicking on it in the ‘Environment’ pane at the top right of RStudio.

General way of importing and exporting of other data formats

  • Using the correct libraries for different data formats can be tedious.
  • R package rio is very convenient for data import and export. It figures out the data format from the file name extension, e.g. .csv for CSV, .dta for Stata, or .sav for SPSS data sets.
  • For a complete list of supported formats, see help(rio).
  • It calls an appropriate package to do the actual importing or exporting.
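
The idea of choosing a reader from the file extension can be sketched in base R. The helper read_by_extension() below is hypothetical and only illustrates the principle; it is not how rio is implemented and it supports just two formats.

```r
# Hypothetical helper (not part of rio): pick a reader based on the file extension.
read_by_extension <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv = read.csv(path),
    rds = readRDS(path),
    stop("Unsupported file extension: ", ext)
  )
}

# Round trip with a temporary CSV file (made-up values).
tmp_csv <- tempfile(fileext = ".csv")
write.csv(data.frame(year = c(2021, 2022), max_temp = c(30.1, 39.5)),
          tmp_csv, row.names = FALSE)
climate_small <- read_by_extension(tmp_csv)
print(climate_small)
```

With rio installed, import(tmp_csv) would play the role of read_by_extension(tmp_csv) here.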

Loading Stata, SPSS and other file types

library('rio')
import("data/climate.dta")

Loading csv files

import("data/climate.csv")

Outputs

  • Outputs work very similarly to the inputs above.
  • The most relevant output formats are R’s own formats.
  • save() saves a selection of objects as an .RData file.
  • save.image() saves the full workspace as an .RData file.
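
A minimal sketch of the difference between the two functions, writing to temporary files. The objects x and y are made up for illustration.

```r
x <- 1:5
y <- c("a", "b")

# save() writes only the objects you name.
one_object_file <- tempfile(fileext = ".RData")
save(x, file = one_object_file)

# save.image() writes every object in the current workspace.
workspace_file <- tempfile(fileext = ".RData")
save.image(file = workspace_file)

# Check what each file contains by loading into fresh environments.
env_one <- new.env()
print(load(one_object_file, envir = env_one))

env_all <- new.env()
print(load(workspace_file, envir = env_all))
```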

Exercise 4.1: Saving data

  • Save your current workspace using function save.image().
  • Save only one variable in the workspace using function save().
  • Make a list of two variables from data_short, and save this list using function save().

R Base Graphics

  • We will cover R base graphics.
  • Other alternatives include ‘ggplot2’…

To create plots with R’s standard graphics package, there are high-level and low-level plotting functions.

  • High-level functions generate a new graphic (and open a device).
  • Low-level functions add elements to an existing graphic.

Simple plots

plot(data_short$MAX)     ## Plotting a single variable

plot(x = data_short$YEAR, y = data_short$MAX)     ## Scatter plot

Multiple plots

par(mfrow = c(1,2)) # multiple plots in a row
plot(data_short$MAX)     ## Plotting a single variable
plot(x = data_short$YEAR, y = data_short$MAX)     ## Scatter plot
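
Since par() changes stay in effect for later plots, it can help to store and restore the old settings. A minimal sketch, using simulated values as a stand-in for data_short$MAX (an assumption for illustration):

```r
# Simulated data standing in for data_short$MAX (illustration only).
set.seed(1)
max_temp <- rnorm(100, mean = 16, sd = 7)

# par() returns the previous settings, so they can be restored afterwards.
old_par <- par(mfrow = c(1, 2))
plot(max_temp, main = "Index plot")
hist(max_temp, main = "Histogram")
par(old_par)  # back to one plot per window
```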

Functions calling methods

Notice that plot() is a generic function that calls methods.

It will perform different operations depending on the class of the passed object. (We study the lm() function in detail in the next session!)

ols_result <- lm(MAX~YEAR, data = data_short)
plot(ols_result)
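
To see which method plot() would pick, you can inspect the class of the object. A minimal sketch, using the built-in cars data set instead of data_short so that it runs on its own:

```r
# Built-in data set used purely for illustration.
fit <- lm(dist ~ speed, data = cars)

# The class of the object determines which plot method is called.
print(class(fit))        # "lm"
print(class(cars))       # "data.frame"

# Methods are named generic.class; these are the ones plot() dispatches to.
plot_lm <- getS3method("plot", "lm")
plot_df <- getS3method("plot", "data.frame")
print(is.function(plot_lm))
```

This is why plot(fit) produces regression diagnostics while plot(cars) produces a scatter plot, even though the same function name is used.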

Creating and saving a graph

pdf("outputs/plot_data_short.pdf")                  
hist(data_short$MAX, breaks = 20)
dev.off()

Customizing Graphics

  • Adding points to an existing plot
  • In the previous example, function dev.off() is called after all the plotting, to close the device, save the file and return control to the screen.
plot(data_short$MAX, type = "l") ## Lines
points(data_short$MAX)

Customizing Graphics

  • The plot() function takes many arguments that change the layout of the plots. See ?par for all graphical options; there are many!

  • Some examples:

    • col: color of lines / points
    • lty, lwd: Line type and thickness
    • pch: Point type (0-25)
    • main, sub: Title, subtitle
    • xlab, ylab: x and y axis labels
    • log: Logarithmic axes in plot(), e.g. log = "x", "y" or "xy" (xlog and ylog are the corresponding par() settings)
    • xlim, ylim: x and y axis limits (for overriding R’s default choices)
    • mfcol, mfrow: Multiple plots in one graphics window (column-wise/row-wise)

Low-Level Graphic Functions

  • lines: Draw lines
  • abline: Quickly add horizontal, vertical lines, and lines using equation \(y = bx + a\)
  • points: Add points
  • arrows: Add arrows
  • title: Add a title
  • legend: Add a legend
  • text: Add text at \((x,y)\) coordinates
  • mtext: Add text with positional specification like side=1,...,4
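
A minimal sketch of how a high-level call and several low-level calls fit together. The series is simulated and the threshold of 22 is arbitrary; both are assumptions for illustration only.

```r
# Simulated series standing in for a temperature variable (illustration only).
set.seed(42)
day <- 1:200
temp <- 15 + 8 * sin(2 * pi * day / 365) + rnorm(200, sd = 2)

# High-level call: creates the plot; arguments control its appearance.
plot(day, temp, type = "l", col = "grey40", lwd = 1,
     main = "Simulated daily temperature", xlab = "Day", ylab = "Temperature")

# Low-level calls: add elements to the existing plot.
hot <- temp > 22
points(day[hot], temp[hot], pch = 19, col = "red")   # mark days above the threshold
abline(h = 22, lty = 2, col = "red")                 # horizontal line at the threshold
abline(a = 15, b = 0.02, lty = 3)                    # line y = a + b * x
legend("topleft", legend = c("Series", "Threshold"),
       col = c("grey40", "red"), lty = c(1, 2))
```

The low-level functions only work after a high-level function has set up a plot; calling them first results in an error.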

Exercise 4.2: Plot maximum temperatures for Maastricht

We want to visualize the daily maximum temperatures in the climate data data_short specifically for Maastricht. First, make a basic plot of the variable MAX, then customise the plot in the following ways:

  1. The title of the X-axis should say ‘Year’, the title of the Y-axis ‘Maximum Temperature’.

  2. Make the plot a line plot with a blue line. (Hint: specifying the colour literally as "blue" works)

  3. Make the tick marks appear on the inside of the figure rather than the outside.

  4. Calculate the average temperature.

  5. Add a horizontal line with the average maximum temperature

You will need to consult the help file for this exercise; therefore, see this more as an exercise in navigating R’s help system than as an exercise in plotting (which we will cover in more detail later).

You may want to ask ChatGPT for help and then try to see if you could also have gotten the same answer yourself; it may not always give you the most straightforward answer though!

Manually saving R plots

  • Call the plotting functions without opening a file device such as pdf().
  • Use the ‘Plots’ pane to save the image manually.
plot(x = data_short$YEAR, y = data_short$MAX)     

Different plot types

You can manually save graphs in several formats.

Best practice is to save a graph through a device such as pdf or similar:

  • pdf(): Adobe PDF (easily integrated into LaTeX).
  • svg(): Scalable Vector Graphics (commonly used on websites).
  • png(), jpeg(), tiff(), bmp(): Various bitmap formats.
jpeg("outputs/plot_data_short.jpeg")                  
plot(x = data_short$YEAR, y = data_short$MAX)   
dev.off()
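
The open, plot, close pattern can be sketched as follows. The example writes to a temporary folder and uses the built-in cars data set, so it does not depend on an outputs folder or on data_short; both choices are assumptions for illustration.

```r
# Write to a temporary folder here; in a project you would use e.g. "outputs/".
out_file <- file.path(tempdir(), "example_plot.pdf")

pdf(out_file, width = 6, height = 4)   # open the device (size in inches)
plot(cars$speed, cars$dist, xlab = "Speed", ylab = "Stopping distance")
dev.off()                              # close the device, which writes the file

print(file.exists(out_file))
```

The same pattern applies to png() and the other bitmap devices, where width and height are given in pixels by default.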

Bonus: Advanced Graphics using ggplot

Exercise 4.3: Use other packages for plots

  • Make the same plot in R using package ‘ggplot2’.
  • You may want to use ChatGPT, since the syntax of ‘ggplot2’ is quite different from what we have covered so far.